1. Introduction

Statistical  models  of  unobserved  components  seem  destined  for  an 
increasing  role  in  econometric  work.   Especially  in  cross-sections,  the 
differences  in  the  values  of  the  left-hand  variables  among  observations 
with  identical  values  of  the  right-hand  variables  are  sufficiently  large 
to  justify  careful  analysis  of  the  apparently  random  component  of  the 
behavior  under  study.   The  simple  characterization  of  randomness  implicit 
in  the  stochastic  specification  of  the  regression  model  seems  inadequate 
when  the  right-hand  variables  in  a  problem  account  for  only  a  small  portion 
of  the  dispersion  of  the  left-hand  variable.  Many  recent  authors  have 
sought  to  attribute  part  of  the  randomness  in  their  samples  to  variations 
within  the  population  of  characteristics  that  are  not  observed.   For 
example,  Griliches  (1973)  assigns  part  of  the  dispersion  of  earnings 
conditional  on  education  to  the  unobserved  differences  in  ability  of 
individuals  with  equal  amounts  of  education.   Domencich  and  McFadden  (1974) 
hypothesize  a  distribution  of  tastes  within  the  population  to  explain 
choices  of  modes  of  transportation  by  individual  commuters.   The  present 
paper  takes  up  the  following  question:   What  can  be  discovered  about  the 
underlying  distribution  of  characteristics  from  the  observed  body  of  data? 
Are  the  assumptions  about  the  distributions  of  unobserved  characteristics 
made  by  previous  authors  verifiable,  or  must  they  be  accepted  on  pure  faith? 

A general statistical model suitable for this discussion is the
following:

$$ y = h(x, \theta, u) \tag{1.1} $$

where $y$ is the scalar left-hand variable, assumed to be qualitative
(taking on only a finite number of integer values), $x$ is a vector of
observed characteristics, $\theta$ is the unobserved characteristic, and
$u$ is a disturbance whose distribution may depend on $x$ and $\theta$.
Apart from the presence of $\theta$, this would be a regression model if
the distribution of $u$ did not depend on $x$ and $\theta$; in the
qualitative case especially, however, this dependence is critical. Our
discussion concerns the untangling of the separate effects of $\theta$
and $u$, where the role of $x$ is subsidiary, so until Section 6 we
consider the case of sampling from a population whose members are
observationally identical, where it is appropriate to suppress $x$:

$$ y = h(\theta, u) \tag{1.2} $$

All observations from the same individual are assumed to correspond to
the same $\theta$, but each one involves a new drawing from the
distribution of $u$. Finally, we assume prior knowledge of
$h(\theta, u)$ and of the distribution of $u$. The last assumption
should become more plausible as the discussion progresses.

Note to (1.1): If the left-hand variable is continuous, $y$ can be
defined by a set of intervals of values of the variable.

Models of unobserved components are particularly important in
the study of the distribution of income. The major theme of the most
influential recent work on income distribution, Christopher Jencks's
book Inequality (1972), is exactly that observed differences among
individuals account for very little of the dispersion of income among
them: "Neither family background, cognitive skill, educational
attainment, nor occupational status explains much of the variation in
men's incomes. Indeed, when we compare men who are identical in all
these respects, we find only 12 to 15 percent less inequality than among
random individuals. How are we to explain these variations among men who
seem to be similarly situated?" (p. 227). Jencks replies that unmeasured
differences in motivation, ability, and especially luck account for the
bulk of the dispersion in income. His discussion is limited by his
failure to distinguish between unobserved differences among individuals,
on the one hand, and differences in the experience of the same
individual at different points in time, on the other. In the context of
measuring income, this distinction is familiar to economists in Milton
Friedman's notion of the permanent and transitory components of measured
income. Jencks alludes briefly to the distribution of permanent income
(footnote 1, p. 233) but the distinction has no role in his discussion.

The class of statistical models studied here provides a general
framework for separating the two sources of the apparently random
differences among individuals at a point in time. Systematic differences
among individuals are indexed by the random variable $\theta$, and
differences in the experiences of a single individual by the random
variable $u$. Friedman's model is a special case of the general model in
which $\theta$ and $u$ are simply added together:

$$ y = \theta + u \tag{1.3} $$

Here $\theta$ is permanent income and $u$ is transitory income. If $y$
is observed for a few successive years, then it is tempting to estimate
permanent income for an individual as the average income over the years:

$$ \hat{\theta} = \frac{1}{T} \sum_{t=1}^{T} y_t \tag{1.4} $$

The difficulty is that the distribution of $\hat{\theta}$ among the
members of the population has more dispersion than the distribution of
$\theta$. This problem arises most critically in Jencks' data, where $T$
is 1, but even where $T$ is 3 or 4 one does not know how much the
distribution of $\hat{\theta}$ tells about the distribution of $\theta$.
What is needed, and what this paper supplies, is a method for extracting
as much reliable information as possible about the distribution of
$\theta$.
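
The extra dispersion of the averaged estimate in (1.4) can be illustrated
numerically. The following is a minimal sketch, not part of the original
analysis: it assumes, purely for illustration, that $\theta$ and $u$ are
independent normal draws with mean zero and that $u$ is independent across
years. The function names and parameter values are hypothetical.

```python
import random
import statistics


def variance_of_average(var_theta, var_u, T):
    """Variance of the T-year average of y = theta + u.

    Assumes theta and u are independent and u is independent across years,
    so the average transitory component contributes var_u / T.
    """
    return var_theta + var_u / T


def simulate_average_income(n_people, T, sd_theta, sd_u, seed=0):
    """Simulate (1.3)-(1.4) and return (variance of theta, variance of theta-hat)."""
    rng = random.Random(seed)
    thetas, averages = [], []
    for _ in range(n_people):
        theta = rng.gauss(0.0, sd_theta)          # permanent component
        ys = [theta + rng.gauss(0.0, sd_u) for _ in range(T)]  # T yearly incomes
        thetas.append(theta)
        averages.append(sum(ys) / T)              # estimate (1.4)
    return statistics.pvariance(thetas), statistics.pvariance(averages)
```

Under these assumptions the variance of $\hat{\theta}$ exceeds that of
$\theta$ by the variance of $u$ divided by $T$, so with $T = 1$ the
observed dispersion mixes the two components completely.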

One of the most carefully studied models of the kind treated in
this paper is the mover-stayer model used by sociologists in analyzing
industrial and other forms of mobility. Most mover-stayer models posit
only two kinds of individuals, movers and stayers (two values of
$\theta$ in the notation of this paper). In these models, the randomness
in individual experience over time governed by $u$ follows a Markov
process. In Section 5, we indicate briefly how the methods of this paper
can be applied to a rather general version of the mover-stayer model.

2. Mixtures of Probabilities

Suppose that for an individual of type $\theta$, the distribution of $y$
is the vector of probabilities $a(\theta)$:

$$ \mathrm{Prob}[y = i \mid \theta] = a_i(\theta) \tag{2.1} $$

We observe the average of this probability over all individuals:

$$ \phi_i = \mathrm{Prob}[y = i] = \int_0^1 a_i(\theta)\, dF(\theta), \tag{2.2} $$

where $F(\theta)$ is the cumulative distribution of types of individuals
in the population, that is, the fraction whose type is lower than
$\theta$. There is a substantial statistical literature dealing with
problems of this form. In the vocabulary of that literature, equation
2.2 is a mixture. The distribution $a(\theta)$ is the kernel and
$F(\theta)$ is the mixing distribution. A survey of the statistical
theory of mixtures appears in Maritz (1970), Chapter 2. In addition,
there is an important body of mathematical thought about problems of the
sort considered here. In the mathematical literature, equation 2.2 is
called a Tchebycheff system (see Karlin and Studden (1966), Chapters I
through V). It appears that statistical and mathematical work in this
area has proceeded almost completely independently. The mathematical
theory is substantially more general and more fully developed, so it
forms the basis for this paper.

Our problem is to obtain information about the distribution of
the unobserved component, $F(\theta)$, given the observed probability
$\phi$ and the known kernel $a(\theta)$. In this section we present
theorems that give a fairly precise characterization of the limits of
knowledge about $F(\theta)$. Most of these theorems are simply
re-interpretations of results of Krein (1951) and other mathematical
students of Tchebycheff systems.
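
To make the mixture (2.2) concrete, the sketch below computes $\phi$ for
a mixing distribution with finitely many mass points. The binomial kernel
used here is an assumption chosen only for illustration (it is not
specified at this point in the text), and the function names are
hypothetical. Exact rational arithmetic is used so that the result can be
checked by hand.

```python
from fractions import Fraction
from math import comb


def binomial_kernel(theta, n):
    """a(theta): probabilities of i = 0..n successes in n trials.

    Illustrative kernel only; with n trials there are M = n + 1 outcomes.
    """
    return [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]


def mixture(types, masses, n):
    """phi = sum_j f_j a(theta_j): equation (2.2) for a discrete F."""
    assert sum(masses) == 1, "masses must sum to one"
    phi = [Fraction(0)] * (n + 1)
    for theta, f in zip(types, masses):
        for i, a_i in enumerate(binomial_kernel(theta, n)):
            phi[i] += f * a_i
    return phi


# Half the population has type 3/10 and half has type 7/10.
phi = mixture([Fraction(3, 10), Fraction(7, 10)],
              [Fraction(1, 2), Fraction(1, 2)], n=2)
```

Only the vector `phi` is observable; the types and masses passed to
`mixture` are exactly what the analysis tries to learn about.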

We begin with the

Assumption of Distinct Types: The matrix
$[a(\theta_1), \ldots, a(\theta_M)]$ has rank $M$ for any distinct set
of types $\theta_1, \ldots, \theta_M$.

This assumption is the defining characteristic of a Tchebycheff system.
It rules out models where the probabilities associated with one
particular type of individual can be expressed as a linear combination
of the probabilities associated with $M - 1$ or fewer other types. This
assumption does not seem unduly strong, and it is satisfied by the
applications studied in this paper.
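
The rank condition can be checked mechanically for any candidate kernel
and set of types. The following sketch does so for a binomial kernel,
which is an assumption made here for illustration; the helper names are
hypothetical. It uses exact Gaussian elimination over the rationals
rather than floating-point arithmetic.

```python
from fractions import Fraction
from math import comb


def binomial_kernel(theta, n):
    """Illustrative kernel: probabilities of i = 0..n successes in n trials."""
    return [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [list(r) for r in rows]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def has_distinct_types(types, n):
    """True if the vectors a(theta_1), ..., a(theta_M) are linearly independent."""
    vectors = [binomial_kernel(Fraction(t), n) for t in types]
    return rank(vectors) == len(types)
```

For example, with $n = 2$ (so $M = 3$) any three distinct types pass the
check, while repeating a type makes the matrix rank-deficient.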

Next we define two useful constructions. First,

$$ \Phi = \left\{ \phi \ \text{satisfying}\ \phi = \int_0^1 a(\theta)\, dF(\theta) \ \text{for some}\ F(\theta) \right\}. \tag{2.3} $$

Here we consider all $F(\theta)$ that are non-decreasing, continuous
from the left, and have a finite number of discontinuities. $\Phi$ is
the set of all possible observed probabilities consistent with a given
problem as defined by $a(\theta)$. Second,

$$ V(\phi) = \left\{ F(\theta) \ \text{satisfying}\ \int_0^1 a(\theta)\, dF(\theta) = \phi \right\}. \tag{2.4} $$

$V(\phi)$ is the set of all distributions of unobserved types in the
population that are consistent with a particular observed probability,
$\phi$. The essence of the problem is that $V(\phi)$ may contain a
variety of distributions. Our characterization of the limits of
knowledge about $F(\theta)$ deals, therefore, with the extremal members
of $V(\phi)$.
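
That $V(\phi)$ can contain more than one distribution is easy to see in
a small case. The sketch below is an illustration under an assumed
binomial kernel with two trials (so $M = 3$), not an example from the
paper: it exhibits a two-type and a three-type mixing distribution that
produce exactly the same observed $\phi$. The names `F1` and `F2` are
hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_kernel(theta, n):
    """Illustrative kernel: probabilities of i = 0..n successes in n trials."""
    return [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]


def mixture(types, masses, n):
    """phi = sum_j f_j a(theta_j) for a discrete mixing distribution."""
    phi = [Fraction(0)] * (n + 1)
    for theta, f in zip(types, masses):
        for i, a_i in enumerate(binomial_kernel(theta, n)):
            phi[i] += f * a_i
    return phi


# Two types, equal masses.
F1 = ([Fraction(3, 10), Fraction(7, 10)],
      [Fraction(1, 2), Fraction(1, 2)])

# Three types; chosen to have the same first two moments as F1.
F2 = ([Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)],
      [Fraction(2, 9), Fraction(5, 9), Fraction(2, 9)])

phi_1 = mixture(*F1, n=2)
phi_2 = mixture(*F2, n=2)
```

With this kernel $\phi$ depends on $F$ only through its first two
moments, so both distributions belong to the same $V(\phi)$ and no
amount of data on $y$ alone can tell them apart.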

The first theorem establishes that no observed probability proves
that there are more than $(M + 2)/2$ different types in the population
(proofs and references appear in the appendix to this section):

Theorem 2.1: For any $\phi \in \Phi$, there exists a cdf,
$F(\theta) \in V(\phi)$, with no more than $(M + 2)/2$ points of
increase.

If we let $f_j$ be the mass at one of the points of increase,
$\theta_j$, then Theorem 2.1 shows that it is always possible that
$\phi$ is a discrete mixture:

$$ \phi = \sum_{j=1}^{N} f_j\, a(\theta_j) \tag{2.5} $$

with $N \le (M + 2)/2$. Here $f_j$ is the fraction of the population
having type $\theta_j$. This result appears independently in the
statistical literature on mixtures in the form of an identification
theorem: Given $\phi$, one can calculate unique $f_j$ and $\theta_j$
satisfying

$$ \sum_{j=1}^{N} f_j\, a(\theta_j) = \phi \tag{2.6} $$

only if $N \le 1 + M/2$. See Teicher (1963), p. 1269.

The second theorem shows that for any observed $\phi$ (with one class
of exceptions) we cannot rule out the possibility that a positive
fraction of the population has an arbitrary type, $\theta^*$:

Theorem 2.2: Suppose $\phi$ is in the interior of $\Phi$ and
suppose $\theta^*$ is an arbitrary type in $[0,1]$. Then there is
a cdf, $F(\theta)$, in $V(\phi)$ with positive mass $p(\theta^*)$ at
$\theta^*$.

This result imposes a limitation on the form of knowledge about $F$ that
we can deduce from $\phi$: Except in borderline cases, we will never be
able to state that any particular type, or any range of types, is
non-existent in the population. On the other hand, $p(\theta^*)$ may be
close to zero; the theorem does not prevent us from finding useful
bounds on the fraction of the population of a certain type or range of
types.

The next theorem provides a bound on the fraction of the population
of type $\theta^*$:

Theorem 2.3: Consider the problem of finding probabilities
$f_1, \ldots, f_N$ and types $\theta_J, \ldots, \theta_N$ obeying

$$ \sum_{j=1}^{N} f_j\, a(\theta_j) = \phi \tag{2.7} $$

where $\theta_1 = \theta^*$ and $N$, $\theta_2$, $\theta_3$ and $J$ take
on one of the following sets of values: If $M$ is odd, either
$N = 1 + (M - 1)/2$, $J = 2$, or $N = 2 + (M - 1)/2$, $J = 4$,
$\theta_2 = 0$, $\theta_3 = 1$; if $M$ is even, $N = 1 + M/2$, $J = 3$,
and either $\theta_2 = 0$ or $\theta_2 = 1$. Then this system has a
unique solution and $f_1$ is the maximal mass at $\theta^*$ for any
$F \in V(\phi)$.

Thus the problem of finding the distribution of types that is most
concentrated at $\theta^*$ is simply one of solving a system of $M$
equations in $M$ unknowns: $N$ values of $f_j$ and $N - J + 1$ values of
$\theta_j$. The solution is called the canonical representation of
$\phi$ involving $\theta^*$.

A related problem is to find bounds on the fraction of the
population whose type is less than some value $\theta^*$:

Theorem 2.4 (Markov-Krein Theorem):

$$ \sum_{j:\, \theta_j < \theta^*} f_j \ \le\ F(\theta^*) \ \le \sum_{j:\, \theta_j \le \theta^*} f_j \quad \text{for all } F \in V(\phi) \tag{2.8} $$

where $f_j$ and $\theta_j$ are the canonical representation
involving $\theta^*$.

The Markov-Krein theorem shows that the canonical representation is
extremal not only with regard to the mass at $\theta^*$ but also with
regard to the mass below $\theta^*$. The upper and lower bounds on
$F(\theta^*)$ differ by precisely the maximal mass, $f_1$.

Unfortunately, the mathematical theory of Tchebycheff systems does
not provide bounds on the fraction of the population between two
arbitrary types. We would like to be able to answer the following
question: Suppose we have a pair of types $\theta_L$ and $\theta_H$, and
we let $P = F(\theta_H) - F(\theta_L)$, the fraction of the population
between $\theta_L$ and $\theta_H$. What are the largest and smallest
values of $P$ consistent with a particular $\phi$? The Tchebycheff
inequality answers this question for the particular case where $\phi$
gives the first two moments of $F(\theta)$. There is an extensive
mathematical literature on generalizations of the Tchebycheff inequality
(summarized in detail in Karlin and Studden (1966), Chapters XII-XIV),
but it does not contain any results of sufficient generality for our
purposes. Mathematicians have been concerned exclusively with sharp
bounds on $P$, that is, bounds that are attained by some
$F \in V(\phi)$, or at least that are approached arbitrarily closely by
members of $V(\phi)$.
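
For reference, the classical two-moment case can be written out
explicitly. This is the standard textbook form of the Tchebycheff
inequality, restated here as an illustration; the symbols $\mu$ and
$\sigma^2$ (the mean and variance of $F(\theta)$) and the constant
$k > 1$ are assumptions of this restatement and do not appear in the
original text.

```latex
\Pr\bigl[\,|\theta - \mu| \ge k\sigma\,\bigr] \;\le\; \frac{1}{k^{2}}
\qquad\Longrightarrow\qquad
\Pr\bigl[\,\mu - k\sigma < \theta < \mu + k\sigma\,\bigr] \;\ge\; 1 - \frac{1}{k^{2}} .
```

The bound is sharp in the sense used above: it is attained by the
three-point distribution that places mass $1/(2k^2)$ at each of
$\mu \pm k\sigma$ and mass $1 - 1/k^2$ at $\mu$, which has mean $\mu$
and variance $\sigma^2$.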

Before going on to our approach to the problem of bounds on the
probability in an interval on the $\theta$ axis, which involves
non-sharp bounds, we need to deal with the fundamental problem of
identifiability. What conditions are required for it to be possible to
find out anything about the fraction of the population in an interval?
There has been a good deal of work on the identifiability of mixtures
(see Maritz (1970), pp. 20-35), all using a strict definition of
identifiability: A mixing distribution is said to be identifiable if its
exact form can be deduced from the value of $\phi$. Strong assumptions
about $F(\theta)$ are required for identifiability. A leading result in
the statistical literature has already appeared here as Theorem 2.1.

A much weaker notion of identifiability seems appropriate in this
paper:

Definition: The probability $P$ is identifiable if there
is some $\phi$ in the interior of $\Phi$ such that $V(\phi)$ contains
no distributions with $P = 0$.

We gain information about $P$ if we can show that it is positive, that
some fraction of the population has types between $\theta_L$ and
$\theta_H$. A problem, as defined by $a(\theta)$, has an identifiable
$P$ if there is some observed outcome $\phi$ for which $P$ must be
positive. It is a remarkable fact that no additional assumptions are
needed to ensure identifiability in a Tchebycheff system:

Theorem 2.5: Every $P$ is identifiable.

Appendix to Section 2

General remark. Proofs of the results in this section are all
taken from Karlin and Studden (1966) (hereafter K & S). They deal with
a somewhat more general problem in which $F(\theta)$ is not required to
obey $\int_0^1 dF(\theta) = 1$ and $a(\theta)$ is not required to
satisfy $\sum_i a_i(\theta) = 1$. In their exposition, $\Phi$ is a
convex cone, while under our assumptions it is a convex subset of the
unit simplex. However, the results invoked here apply without
modification, because our $\Phi$ is simply the intersection of their
$\Phi$ and the unit simplex.

Proof of Theorem 2.1: If $\phi$ is on the boundary of $\Phi$, apply
Theorem II.2.1, K & S. Otherwise, apply their Corollary II.3.1. If $M$
is odd, $N = (M + 1)/2$.

Proof of Theorem 2.2: The appropriate cdf, $F(\theta)$, can be taken
as defined in Theorem 2.3. K & S, Theorem II.3.1, establish that the
mass is positive.

Proof of Theorem 2.3: K & S, Theorem II.4.1 (attributed to Krein
(1951)), show that the canonical representation involving $\theta^*$
assigns maximal mass to $\theta^*$. Existence and uniqueness of the
canonical representation follow from their Theorem II.3.1 and Corollary
II.3.2, respectively.

Proof of Theorem 2.4: See K & S, Theorem III.2.1.

Proof of Theorem 2.5: We need to exhibit a $\phi$ such that all
$F \in V(\phi)$ have positive mass in the interval
$[\theta_L, \theta_H]$. Define $\theta_k$ =

$$ \frac{N - k + 1}{N}\, \theta_L + \frac{k - 1}{N}\, \theta_H . $$

If $M$ is odd, let $N = (M + 1)/2$ and

$$ \phi = \frac{1}{N} \sum_{k=1}^{N} a(\theta_k); \tag{A2.1} $$

otherwise, let $N = (M + 2)/2$ and

a(0)  (A2.2) 


N 

I 
k=l 

1 


N+1 

For this $\phi$, the values of $\theta_k$ and $f_k = 1/N$ or $1/(N+1)$
are a canonical representation. By K & S's Lemma II.3.1, every
$F(\theta) \in V(\phi)$ assigns positive mass to
$[\theta_{k-1}, \theta_k]$, so clearly $P$ must be positive. Finally,
K & S's Theorem II.2.1 establishes that $\phi$ is in the interior of
$\Phi$.

3. Bounds for intervals

The theory of the previous section gives bounds on the fraction
of the population within a prescribed interval only when the interval
starts at 0 or ends at 1. In this section we discuss a method for
deriving bounds for an interval beginning at $\theta_L$ and ending at
$\theta_H$, that is, bounds on $F(\theta_H) - F(\theta_L)$ over all
$F(\theta) \in V(\phi)$. Our strategy is the following: We define a set
$V_O(\phi, D_N)$ that encloses $V(\phi)$; $V_O$ contains all
distributions consistent with $\phi$ and some others as well. $V_O$ is
mathematically tractable and from it we can derive "outside bounds" as
the maximum and minimum of $F(\theta_H) - F(\theta_L)$ within $V_O$.
These are true bounds on the fraction of the population between
$\theta_L$ and $\theta_H$, but they understate the amount of information
available because they are taken over a set that includes false
distributions. We show that as the index of computational effort, $N$,
rises, more and more of the false distributions are excluded from $V_O$,
and the bounds derived from it become sharper and sharper. In fact, as
$N$ approaches infinity, $V_O(\phi, D_N)$ approaches $V(\phi)$, and the
bounds approach the sharp bounds taken over $V(\phi)$.

We also define a set $V_I(\phi, D_N)$ that is enclosed by $V(\phi)$. It
contains no false distributions but excludes some true distributions, so
the "inside bounds" derived from it are uniformly too optimistic. The
reason for computing them is that the difference between the outside and
inside bounds is a measure of the pessimism of the outside bounds. Again,
the inside bounds converge to the exact bounds as $N$ increases.

In constructing $V_O$ and $V_I$, we make use of a partition, $D_N$, of
the $\theta$-axis:

$$ D_N = \{ [\theta_0, \theta_1], [\theta_1, \theta_2], \ldots, [\theta_{N-1}, \theta_N] \} \tag{3.1} $$

where $\theta_0 = 0$ and $\theta_N = 1$. Throughout, we will consider an
arbitrary sequence of $D_N$'s with the properties that higher numbered
partitions are refinements of lower numbered ones:

$$ D_N \subset D_{N+1}, \quad \text{all } N \tag{3.2} $$

and that the partition becomes finer and finer:

$$ \lim_{N \to \infty}\ \max_{j = 1, \ldots, N} \left| \theta_j - \theta_{j-1} \right| = 0 \tag{3.3} $$

Further, we require that the $\theta_j$ include $\theta_L$ and
$\theta_H$:

$$ D_3 = \{ [0, \theta_L], [\theta_L, \theta_H], [\theta_H, 1] \} \tag{3.4} $$

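We begin our derivation of the outer bounds by defining

$$ \underline{a}_{i,j} = \min_{\theta_{j-1} \le \theta \le \theta_j} a_i(\theta) \tag{3.5} $$

$$ \bar{a}_{i,j} = \max_{\theta_{j-1} \le \theta \le \theta_j} a_i(\theta) \tag{3.6} $$

Now, writing $p_j = F(\theta_j) - F(\theta_{j-1})$ for the mass in the
$j$th interval,

$$ \int_0^1 a_i(\theta)\, dF(\theta) \ \ge\ \sum_{j=1}^{N} \int_{\theta_{j-1}}^{\theta_j} \underline{a}_{i,j}\, dF(\theta) = \sum_{j=1}^{N} \underline{a}_{i,j}\, p_j \tag{3.7} $$

and

$$ \int_0^1 a_i(\theta)\, dF(\theta) \ \le\ \sum_{j=1}^{N} \int_{\theta_{j-1}}^{\theta_j} \bar{a}_{i,j}\, dF(\theta) = \sum_{j=1}^{N} \bar{a}_{i,j}\, p_j \tag{3.8} $$
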
These are the lower and upper Stieltjes sums of the integral with
respect to $D_N$ (see Apostol (1957), p. 203). Defining $\underline{A}$
and $\bar{A}$ in the obvious way, we have

$$ \underline{A}\, p \ \le\ \phi \ \le\ \bar{A}\, p \tag{3.9} $$

In addition, we require $\sum_j p_j = 1$ and $p_j \ge 0$, all $j$. The
set of solutions to this system of inequalities, $S_O(\phi, D_N)$, is a
convex polyhedron and can be represented most compactly as the convex
hull of its vertices. Calculation of the vertices is discussed in the
appendix to this section.

The set $S_O(\phi, D_N)$ contains all the probabilities consistent with
the original problem and possibly some others as well. Our next step is
to compare the information about $F(\theta)$ contained in the computable
$S_O(\phi, D_N)$ with the information in the uncomputable $V(\phi)$. To
put $S_O(\phi, D_N)$ in a comparable form, we define

$$ V_O(\phi, D_N) = \{ F(\theta) \mid F(\theta_j) - F(\theta_{j-1}) = p_j,\ j = 1, \ldots, N, \ \text{for some } p \in S_O(\phi, D_N) \} \tag{3.10} $$

$V_O$ contains all the distributions that have the appropriate mass in
each interval. Then, from the construction of $V_O$,

$$ V(\phi) \subset V_O(\phi, D_N), \quad \text{any } D_N \tag{3.11} $$

Our procedure understates the information available about $F(\theta)$,
in that it suggests that some distributions are compatible with the
observed probabilities $\phi$ when in fact they are not. It never makes
the opposite mistake.
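
The membership test implied by (3.5) through (3.9) can be sketched in a
few lines. The example below assumes a binomial kernel, which is
unimodal in $\theta$ with mode $i/n$, so that the interval minimum is
always at an endpoint and the maximum is at the mode when the mode lies
inside the interval. The kernel choice and all function names are
illustrative assumptions, not part of the original method.

```python
from fractions import Fraction
from math import comb


def a_i(theta, i, n):
    """Illustrative binomial kernel component a_i(theta)."""
    return comb(n, i) * theta**i * (1 - theta)**(n - i)


def kernel_min_max(i, n, lo, hi):
    """Min and max of a_i over [lo, hi], as in (3.5) and (3.6).

    Relies on a_i being unimodal on [0, 1] with mode i/n.
    """
    candidates = [a_i(lo, i, n), a_i(hi, i, n)]
    mode = Fraction(i, n)
    at_mode = [a_i(mode, i, n)] if lo <= mode <= hi else []
    return min(candidates), max(candidates + at_mode)


def outside_check(phi, p, endpoints, n):
    """True if p satisfies (3.9) row by row, with sum(p) = 1 and p >= 0."""
    if sum(p) != 1 or any(x < 0 for x in p):
        return False
    for i in range(n + 1):
        lower = upper = Fraction(0)
        for j, p_j in enumerate(p):
            a_min, a_max = kernel_min_max(i, n, endpoints[j], endpoints[j + 1])
            lower += a_min * p_j   # lower Stieltjes sum, (3.7)
            upper += a_max * p_j   # upper Stieltjes sum, (3.8)
        if not (lower <= phi[i] <= upper):
            return False
    return True


endpoints = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
phi = [Fraction(29, 100), Fraction(21, 50), Fraction(29, 100)]
```

In this example `phi` is generated by equal masses at types 3/10 and
7/10, so the interval masses `[0, 1/2, 1/2, 0]` pass the check, as the
enclosure (3.11) requires, while a vector placing all mass in the first
interval is rejected.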

What are the costs and benefits of using a finer set of endpoints?
The only costs are computational; adding a refinement to the endpoints
can never reduce the precision of our bounds:
$V_O(\phi, D_{N+1}) \subset V_O(\phi, D_N)$. Breaking an interval into
two intervals not only helps localize the probability within the
original interval but refines the bounds on other probabilities as well,
by reducing the imprecision introduced in formulas 3.7 and 3.8.

Finally, we show that the true set $V(\phi)$ can be approximated
arbitrarily closely by using a sufficiently large set of intervals:

$$ V(\phi) = \bigcap_{N=3}^{\infty} V_O(\phi, D_N) \tag{3.12} $$

That is, if $F(\theta)$ is not in $V(\phi)$, there is some set of
endpoints, $D_N$, such that the fact is revealed: $F(\theta)$ is not in
$V_O(\phi, D_N)$ either.

The results of this section show that the mathematically simple
$S_O(\phi, D_N)$ provides information about $F(\theta)$ that has a
rigorous interpretation, becomes more precise as the set of endpoints
becomes more refined, and converges to the information in the
mathematically intractable $V(\phi)$.

Since the cost of computing $S_O(\phi, D_N)$ rises rather sharply with
$N$, it is useful to have information about the amount of imprecision
introduced by a given partition, $D_N$, to evaluate the prospective
benefits of using a finer partition. For this purpose we develop a set
of bounds that are known to be attained (and are usually exceeded) in
$V(\phi)$. These bounds set a lower limit on the looseness of the
outside bounds already discussed.

Among the members of $V(\phi)$ are some distributions that assign
probability only at the points $\theta_1, \ldots, \theta_N$. Such a
distribution obeys

$$ \sum_{j=1}^{N} a(\theta_j)\, p_j = \phi \tag{3.13} $$

where $p_j = F(\theta_j) - F(\theta_{j-1})$. In matrix form

$$ \phi = A p \tag{3.14} $$

This equation, together with the requirement that $p$ is non-negative,
defines a set of probabilities, $S_I(\phi, D_N)$. Again, this is a
convex polyhedron and is fully characterized by its set of vertices.
Every vector $p$ in $S_I$ corresponds to a distribution in $V(\phi)$
that assigns probability only at the points
$\theta_1, \ldots, \theta_N$. We define $V_I(\phi, D_N)$ as the set of
distributions corresponding to the set of probabilities,
$S_I(\phi, D_N)$; each probability contributes only one distribution.
Then $V(\phi)$ encloses $V_I(\phi, D_N)$, so the extremal members of
$V_I$ meet our purpose of indicating how closely the outside bounds can
be attained. As the partition becomes finer, $V_I$ becomes richer, and
ultimately converges to $V$.
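
The inside bounds can be computed by brute-force enumeration of the
vertices of $S_I$. The sketch below is one way to do this under stated
assumptions: a binomial kernel, a small grid of mass points, exact
rational arithmetic, and the convention (following the left-continuous
$F$) that the interval mass counts points with
$\theta_L \le \theta_j < \theta_H$. All names are hypothetical, and the
enumeration over all column subsets is practical only for small
problems.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binomial_kernel(theta, n):
    """Illustrative kernel: probabilities of i = 0..n successes in n trials."""
    return [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]


def solve(matrix, rhs):
    """Solve a square system exactly by Gauss-Jordan elimination; None if singular."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def inside_vertices(phi, points, n):
    """Vertices of S_I: non-negative solutions of A p = phi, equation (3.14),
    with at most M = n + 1 non-zero masses."""
    m = n + 1
    columns = [binomial_kernel(t, n) for t in points]
    vertices = set()
    for basis in combinations(range(len(points)), m):
        square = [[columns[j][i] for j in basis] for i in range(m)]
        sol = solve(square, phi)
        if sol is not None and all(x >= 0 for x in sol):
            p = [Fraction(0)] * len(points)
            for j, x in zip(basis, sol):
                p[j] = x
            vertices.add(tuple(p))
    return sorted(vertices)


def inside_bounds(phi, points, n, theta_lo, theta_hi):
    """Smallest and largest mass in [theta_lo, theta_hi) over the vertices of S_I."""
    masses = [sum(x for t, x in zip(points, v) if theta_lo <= t < theta_hi)
              for v in inside_vertices(phi, points, n)]
    return min(masses), max(masses)


points = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
phi = [Fraction(5, 16), Fraction(3, 8), Fraction(5, 16)]  # equal masses at 1/4 and 3/4
lo, hi = inside_bounds(phi, points, 2, Fraction(1, 4), Fraction(3, 4))
```

Because the interval mass is linear in $p$, its extremes over the
polyhedron are found at vertices, which is why enumerating them
suffices. The generating distribution in this example puts mass 1/2 in
the interval, and that value necessarily lies between `lo` and `hi`.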

The methods and conclusions of this section are summarized in
the following theorem:

Theorem 3.1: Let

$$ V_I(\phi, D_N) = \Bigl\{ F(\theta) \Bigm| F(\theta) = \sum_{j:\, \theta_j < \theta} p_j \ \text{for some } p \in S_I(\phi, D_N) \Bigr\} \tag{3.15} $$

and

$$ V_O(\phi, D_N) = \{ F(\theta) \mid F(\theta_j) - F(\theta_{j-1}) = p_j,\ j = 1, \ldots, N, \ \text{for some } p \in S_O(\phi, D_N) \} \tag{3.16} $$

Then the following three properties hold:

(i) Enclosure:

$$ V_I(\phi, D_N) \subset V(\phi) \subset V_O(\phi, D_N) \tag{3.17} $$

(ii) Monotonicity:

$$ V_I(\phi, D_N) \subset V_I(\phi, D_{N+1}) \quad \text{and} \quad V_O(\phi, D_{N+1}) \subset V_O(\phi, D_N) \tag{3.18} $$

(iii) Convergence:

$$ \bigcap_{N=3}^{\infty} V_O(\phi, D_N) = V(\phi) \doteq \bigcup_{N=3}^{\infty} V_I(\phi, D_N) \tag{3.19} $$

(The precise meaning of $\doteq$ is explained in the proof.)

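The final task of this section is to show how to find bounds on
$P = F(\theta_H) - F(\theta_L)$ once a suitably refined partition,
$D_N$, has been selected and the vertices $p^{(1)}, \ldots, p^{(K)}$ of
$S_O(\phi, D_N)$ calculated. For any $F(\theta) \in V_O(\phi, D_N)$,

$$ P = \sum_{j \in J} p_j \tag{3.20} $$

where $J = \{ j \mid \theta_L < \theta_j \le \theta_H \}$. Since $P$ is
a linear function on a polyhedron, it attains its extreme values at the
vertices. Thus we define

$$ \underline{P}_O = \min_{k = 1, \ldots, K}\ \sum_{j \in J} p_j^{(k)} \tag{3.21} $$

and

$$ \bar{P}_O = \max_{k = 1, \ldots, K}\ \sum_{j \in J} p_j^{(k)} \tag{3.22} $$

so

$$ \underline{P}_O \le P \le \bar{P}_O \tag{3.23} $$

We can define inside bounds $\underline{P}_I$ and $\bar{P}_I$ by a
similar computation on the vertices of $S_I(\phi, D_N)$. Then from
Theorem 3.1,

$$ \underline{P}_O \le \underline{P} \le \underline{P}_I \le \bar{P}_I \le \bar{P} \le \bar{P}_O \tag{3.24} $$

where $\underline{P}$ and $\bar{P}$ are the exact bounds over
$V(\phi)$. Note that when $\theta_L = 0$ or $\theta_H = 1$,
$\underline{P}$ and $\bar{P}$ can be computed exactly by the methods of
Section 2.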

Appendix to Section 3

Computing the vertices of $S_O(\phi, D_N)$. Each vertex of
$S_O(\phi, D_N)$ is a non-negative solution to

$$ \begin{bmatrix} \bar{A} \\ -\underline{A} \\ v' \end{bmatrix} p \;-\; \begin{bmatrix} I \\ 0 \end{bmatrix} \psi \;=\; \begin{bmatrix} \phi \\ -\phi \\ 1 \end{bmatrix} \tag{A3.1} $$

where $v$ is a vector of $N$ ones and $\psi$ is a vector of $2M$ slack
variables. Further, to be a vertex, no more than $2M + 1$ elements of
$p$ and $\psi$ together may be nonzero. Suppose $K$ of the elements of
$p$ are nonzero and $2M + 1 - K$ of the elements of $\psi$ are nonzero.
Then

$$ \begin{bmatrix} \tilde{A} & 0 \\ A^* & -I \end{bmatrix} \begin{bmatrix} \tilde{p} \\ \psi^* \end{bmatrix} = \begin{bmatrix} \tilde{\phi} \\ \phi^* \end{bmatrix} \tag{A3.2} $$

where $\tilde{A}$ contains the columns of $\bar{A}$, $-\underline{A}$
and $v'$ corresponding to the nonzero probabilities, $\tilde{p}$, and
contains the rows of $\bar{A}$ and $-\underline{A}$ for the zero values
of the slack variables. The last row of $\tilde{A}$ is $v'$. $A^*$
contains the remaining rows of $\bar{A}$ and $-\underline{A}$. $\psi^*$
contains the nonzero slack variables. $\tilde{\phi}$ consists of the
elements of $\phi$, with appropriate sign, for rows with zero slack
variables, followed by 1 for the last row, and $\phi^*$ consists of the
remaining elements of $\phi$ and $-\phi$. Since the system is
block-triangular, it has the recursive solution

$$ \tilde{p} = \tilde{A}^{-1} \tilde{\phi} \tag{A3.3} $$

$$ \psi^* = A^* \tilde{p} - \phi^* \tag{A3.4} $$

$\tilde{p}$ contains the non-zero elements of a vertex of
$S_O(\phi, D_N)$ if $\tilde{p} \ge 0$ and $\psi^* \ge 0$. The vertices
of the set of all solutions can be calculated by generating
systematically the solutions for all possible choices of the elements of
$p$ and $\psi$.

Proof of Theorem 3.1:

(i) Enclosure

(a) Consider $F(\theta) \in V_I(\phi, D_N)$. Then since $Ap = \phi$,
$\int_0^1 a(\theta)\, dF(\theta) = \phi$ and $F(\theta) \in V(\phi)$.

(b) Consider $F(\theta) \in V(\phi)$. Let
$p_j = F(\theta_j) - F(\theta_{j-1})$. From formulas (3.7) and (3.8),
$p \in S_O(\phi, D_N)$, so $F(\theta) \in V_O(\phi, D_N)$.

(ii) Monotonicity

(a) Consider $F(\theta) \in V_I(\phi, D_N)$ and let $p_1, \ldots, p_N$
be the associated vector of probabilities. Without loss of generality,
assume that $D_{N+1}$ differs from $D_N$ by its $\theta_N$. Then, with
the points indexed as in $D_{N+1}$,

$$ \sum_{j=1}^{N-1} a(\theta_j)\, p_j + a(\theta_{N+1})\, p_N = \phi \tag{A3.5} $$

This shows that the vector

$$ p = [p_1, \ldots, p_{N-1}, 0, p_N]' \tag{A3.6} $$

is in $S_I(\phi, D_{N+1})$, so $F(\theta) \in V_I(\phi, D_{N+1})$.

(b) Consider $F(\theta) \in V_O(\phi, D_{N+1})$. Let $\theta_j^{N}$ and
$\theta_j^{N+1}$ be the points of the two partitions, let
$\underline{A}^{N}$, $\bar{A}^{N}$, $\underline{A}^{N+1}$, and
$\bar{A}^{N+1}$ be the corresponding matrices calculated from
$a(\theta)$, let $p_j^{N} = F(\theta_j^{N}) - F(\theta_{j-1}^{N})$,
$j = 1, \ldots, N$, and
$p_j^{N+1} = F(\theta_j^{N+1}) - F(\theta_{j-1}^{N+1})$,
$j = 1, \ldots, N + 1$, and let $\underline{B}$ and $\bar{B}$ be
matrices obtained by duplicating the column in $\underline{A}^{N}$ and
$\bar{A}^{N}$ that corresponds to the interval containing the point in
$D_{N+1}$ but not in $D_N$. Now

$$ \underline{A}^{N+1} p^{N+1} \le \phi \le \bar{A}^{N+1} p^{N+1}. \tag{A3.7} $$

Further, $\underline{B} \le \underline{A}^{N+1}$ and
$\bar{B} \ge \bar{A}^{N+1}$ by their constructions, so

$$ \underline{B}\, p^{N+1} \le \phi \le \bar{B}\, p^{N+1} \tag{A3.8} $$

But $\underline{B}\, p^{N+1} = \underline{A}^{N} p^{N}$ and
$\bar{B}\, p^{N+1} = \bar{A}^{N} p^{N}$, so

$$ \underline{A}^{N} p^{N} \le \phi \le \bar{A}^{N} p^{N} \tag{A3.9} $$

and $p^{N} \in S_O(\phi, D_N)$. We conclude that
$F(\theta) \in V_O(\phi, D_N)$.

(iii) Convergence

(a) By $\doteq$, we mean

$$\bigcup_{|\psi-\phi|<\varepsilon} \bigcup_{N=3}^{\infty} V_I(\psi, D_N) = \bigcup_{|\psi-\phi|<\varepsilon} V(\psi) \tag{A3.10}$$

for any $\varepsilon > 0$. For some $\phi$ on the boundary of $\Phi$, $V_I(\phi, D_N)$
may be empty for all $N$, so it is impossible that
$\bigcup_{N=3}^{\infty} V_I(\phi, D_N) = V(\phi)$ in all cases.

(1) Consider $F(\theta) \in \bigcup_{|\psi-\phi|<\varepsilon} \bigcup_{N=3}^{\infty} V_I(\psi, D_N)$. Then there
is a $\psi$ and an $N^*$ such that $|\psi-\phi| < \varepsilon$ and $F(\theta) \in V_I(\psi, D_N)$ for
$N \ge N^*$. Thus

$$\psi = \sum_{j=1}^{N} a(\tilde{\theta}_j)\,(F(\theta_j) - F(\theta_{j-1})), \quad \text{all } N \ge N^* \tag{A3.11}$$

But the right-hand side of (A3.11) converges to the
corresponding integral (see Apostol (1957), Exercise 9-4,
p. 243), so

$$\psi = \int_0^1 a(\theta)\,dF(\theta) \tag{A3.12}$$

and $F(\theta) \in V(\psi)$.

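(2) Consider $F(\theta) \in \bigcup_{|\psi-\phi|<\varepsilon} V(\psi)$. Then

$$\left| \int_0^1 a(\theta)\,dF(\theta) - \phi \right| = \eta < \varepsilon \tag{A3.13}$$

There exists a partition $D_N$ such that

$$\left| \psi - \int_0^1 a(\theta)\,dF(\theta) \right| < \varepsilon - \eta \tag{A3.14}$$

where

$$\psi = \sum_{j=1}^{N} a(\tilde{\theta}_j)\,(F(\theta_j) - F(\theta_{j-1})) \tag{A3.15}$$

so $F(\theta) \in V_I(\psi, D_N)$. Now

$$|\psi - \phi| < \eta + \varepsilon - \eta = \varepsilon \tag{A3.16}$$

so $F(\theta) \in \bigcup_{|\psi-\phi|<\varepsilon} \bigcup_{N=3}^{\infty} V_I(\psi, D_N)$ as required.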

(b) In view of monotonicity, we need only show that
$\bigcap_{N=3}^{\infty} V_O(\phi, D_N) \subset V(\phi)$. Consider $F(\theta) \in V_O(\phi, D_N)$, all $N$.
Let

$$L_i(F, D_N) = \sum_{j=1}^{N} \underline{a}_{ij}\, p_j \tag{A3.17}$$

and

$$U_i(F, D_N) = \sum_{j=1}^{N} \bar{a}_{ij}\, p_j \tag{A3.18}$$

From the definition of the integral (2.2),

$$\lim_{N \to \infty} L_i(F, D_N) = \lim_{N \to \infty} U_i(F, D_N) = \int_0^1 a_i(\theta)\,dF(\theta) \tag{A3.19}$$

But $L_i(F, D_N) \le \phi_i$ and $U_i(F, D_N) \ge \phi_i$ for all $N$, so

$$\int_0^1 a_i(\theta)\,dF(\theta) = \phi_i, \tag{A3.20}$$

and $F(\theta) \in V(\phi)$.
4. The Sex Composition of Families

The following example illustrates the nature of the information
about the distribution of an unobserved component in a simple case.
Suppose that we observe a large number of apparently identical families
with two children, and suppose further that a fraction $\phi_1$ of the families
have no girls, $\phi_2$ have one girl, and $\phi_3$ have two girls. Each family
has a probability $\theta$ that a given child will be a girl. In terms of the
general model given earlier, if $y$ is the number of girls in a family,

$$y = h(\theta, u) = u, \quad \text{where } u \text{ is binomial of order 2 with parameter } \theta. \tag{4.1}$$

If all families have the same $\theta$, then $\phi$ will be the binomial distribution:

$$\phi_1 = (1-\theta)^2; \quad \phi_2 = 2\theta(1-\theta); \quad \text{and} \quad \phi_3 = \theta^2 \tag{4.2}$$

If $\theta$ varies among families, then $\phi$ will be the mixed binomial:²

$$\phi_1 = \int_0^1 (1-\theta)^2\,dF(\theta); \tag{4.3}$$

$$\phi_2 = \int_0^1 2\theta(1-\theta)\,dF(\theta); \tag{4.4}$$

$$\phi_3 = \int_0^1 \theta^2\,dF(\theta). \tag{4.5}$$

² This possibility has been discussed in the literature on
mathematical demography (for example, Goodman (1961) and Weiler (1959)).
This treatment of the sex composition of families is only an example and
does not consider other aspects of the problem, especially the effects
of the efforts of parents to influence the composition through stopping
rules. On this, see Ben Porath and Welch (1972).
In our earlier notation, the kernel is:

$$a(\theta) = \begin{bmatrix} (1-\theta)^2 \\ 2\theta(1-\theta) \\ \theta^2 \end{bmatrix} \tag{4.6}$$


Ben Porath and Welch (1972) report the following distribution for the
sexes of the first two children of American families: $\phi_1 = 0.262$,
$\phi_2 = 0.497$, and $\phi_3 = 0.241$. The mean of this distribution is 0.979,
suggesting that if $\theta$ had a single value, it would be half the mean,
0.489. However, the binomial distribution with parameter $\theta = 0.489$ is
[0.261, 0.500, 0.239], which has somewhat less dispersion than the
observed $\phi$. No single value of $\theta$ can explain the observed distribution
of sexes, so we are forced to consider a distribution of the propensity
to have girls, $\theta$, within the population.
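The arithmetic behind this comparison can be checked directly. The sketch below recomputes the mean, the binomial distribution at $\theta = 0.489$, and the two variances. Using the variance as the measure of dispersion is an assumption of the sketch, since the text does not name a specific measure, and the function names are chosen here for illustration.

```python
# Observed shares of two-child families with 0, 1, and 2 girls
# (Ben Porath and Welch (1972), as quoted in the text).
phi = [0.262, 0.497, 0.241]

def moments(p):
    """Mean and variance of the number of girls under distribution p."""
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    return mean, var

def binomial2(theta):
    """Binomial distribution of order 2 with parameter theta, as in (4.2)."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

mean_obs, var_obs = moments(phi)
theta_single = 0.489  # half the mean (0.4895), to three decimals as in the text
binom = binomial2(theta_single)
_, var_binom = moments(binom)

print(round(mean_obs, 3))            # mean of the observed distribution
print([round(b, 3) for b in binom])  # binomial shares at theta = 0.489
print(var_obs > var_binom)           # observed phi is more dispersed
```

The last comparison is the one that matters: a single $\theta$ cannot produce more variance than the binomial allows, so the excess has to come from variation in $\theta$ across families.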

The theory of Tchebycheff systems discussed in Section 2 focuses
attention on the canonical representations involving alternative values
of $\theta^*$, a preassigned type. Since $M$ is 3 for this problem, either $N = 2$
and $J = 2$, in which case the canonical representation requires solving
for $f^*$, $f_2$, and $\theta_2$, or $N = 3$ and $J = 4$, in which case the canonical
representation requires solving a linear system for $f_1$, $f^*$, and $f_3$. In
both cases $f^*$ is the upper bound on the fraction of the population that
has probability $\theta^*$ of having a girl, by Theorem 2.3. Further, from
Theorem 2.4, when $N = 2$ and $\theta^* < \theta_2$, $f^*$ is the upper bound on the fraction
of the population with $\theta$ less than $\theta^*$, $F(\theta^*)$; when $N = 2$ and $\theta^* > \theta_2$, $f^*$
is the upper bound on the fraction of the population with $\theta$ at or above
$\theta^*$, $1 - F(\theta^*)$; and when $N = 3$, $f_1 + f^*$ is the upper bound on $F(\theta^*)$ and
$f^* + f_3$ is the upper bound on $1 - F(\theta^*)$. Table 1 presents canonical
representations for a variety of values of $\theta^*$. For $\theta^*$ outside a short
interval enclosing 0.489, the canonical representation has only one
additional type, $\theta_2$. The first part of Table 1 shows a variety of
representations of this kind. When $\theta^*$ is extreme, the representation
gives a low weight ($f^*$) to $\theta^*$ and a high weight to a $\theta_2$ that is close to
0.489. As $\theta^*$ approaches 0.489, it receives higher weight and the second
type, $\theta_2$, becomes more extreme. At the critical points $\theta^* = 0.4868$ and
$\theta^* = 0.4923$, $\theta_2$ reaches the boundary of the unit interval (1 and 0,
respectively), and we enter the region where the representation
gives weight to three values of $\theta$: the two extremes, $\theta = 0$ and $\theta = 1$,
Table 1

Canonical Representations for the Mixed Binomial
Model of the Sex Composition of Families

$\theta^*$    $\theta_2$     $f^*$     $f_2$
0.00    .4923    .0058    .9942
0.30    .4968    .0372    .9628
0.40    .5050    .1479    .8521
0.48    .6358    .9390    .0610
0.50    .3571    .9265    .0735
0.60    .4769    .1022    .8978
0.70    .4829    .0304    .9696
1.00    .4868    .0053    .9947

$\theta^*$    $f_1$ ($\theta = 0$)    $f^*$     $f_3$ ($\theta = 1$)
0.49    .0034          .9944    .0022
and $\theta = \theta^*$. One such representation is shown in the second part of
Table 1. According to this representation, the observed distribution
of sex compositions could be generated by a population in which 99 44/100%
of couples had a probability of 0.49 of having girls, 0.34% had nothing
but boys, and 0.22% had nothing but girls.
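The entries of Table 1 can be reproduced numerically. Because the components of $a(\theta)$ in (4.6) are linear combinations of 1, $\theta$, and $\theta^2$, a distribution $F$ is consistent with $\phi$ exactly when its first two moments equal $m_1 = \phi_2/2 + \phi_3$ and $m_2 = \phi_3$. The sketch below is a moment-matching shortcut based on that observation, not the Tchebycheff-system construction of Section 2, and the function names are hypothetical.

```python
phi = [0.262, 0.497, 0.241]
m1 = phi[1] / 2 + phi[2]   # E[theta]   = 0.4895
m2 = phi[2]                # E[theta^2] = 0.241

def two_point(theta_star):
    """Representation on {theta_star, theta_2}: returns (f_star, theta_2, f_2).

    Solves f_star + f_2 = 1, E[theta] = m1, E[theta^2] = m2.
    Returns None when theta_2 falls outside [0, 1] (the N = 3 region).
    Assumes theta_star differs from m1.
    """
    theta_2 = (m2 - m1 * theta_star) / (m1 - theta_star)
    if not 0.0 <= theta_2 <= 1.0:
        return None
    f_2 = (m1 - theta_star) / (theta_2 - theta_star)
    return 1.0 - f_2, theta_2, f_2

def three_point(theta_star):
    """Weights (f_1, f_star, f_3) on theta = 0, theta_star, and 1."""
    f_star = (m1 - m2) / (theta_star * (1.0 - theta_star))
    f_3 = m1 - f_star * theta_star
    f_1 = 1.0 - f_star - f_3
    return f_1, f_star, f_3

for t in (0.0, 0.30, 0.48, 0.50, 1.0):
    print(t, [round(v, 4) for v in two_point(t)])
print(0.49, [round(v, 4) for v in three_point(0.49)])
```

For $\theta^*$ inside the short interval around 0.489, `two_point` returns `None` and the three-point form applies, which matches the switch between the two parts of Table 1.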

From Table 1 we can derive the Markov-Krein bounds $\underline{P}$ and $\bar{P}$ for
the fraction of the population with $\theta$ between 0 and $\theta_H$. These bounds
are presented in Table 2, along with the outside and inside bounds
calculated by the methods of Section 3. All of the bounds agree that it
is quite possible that no couple has a probability of having a girl
below 0.48 and also possible that none has a probability above 0.50 (but

Table 2
Bounds on the Fraction of the Population with $\theta$ between 0 and $\theta_H$

$\theta_H$    $\underline{P}_O$    $\underline{P}$    $\underline{P}_I$    $\bar{P}_I$    $\bar{P}$    $\bar{P}_O$
0.30    0        0        0        .0236    .0372    .0375
0.40    0        0        0        .1018    .1479    .1500
0.48    0        0        0        .7545    .9390    .9543
0.49    .0033    .0034    .0038    .9853    .9966    .9979
0.50    .0530    .0735    .0742    1.0      1.0      1.0
0.60    .8720    .8978    .8990    1.0      1.0      1.0
0.70    .9657    .9696    .9697    1.0      1.0      1.0

Explanation:
$\underline{P}$, $\bar{P}$: Markov-Krein exact bounds, derived from Table 1.

$\underline{P}_O$, $\bar{P}_O$: Outside bounds, with endpoints 0, .15, .25, .30, .35, .38,
.40, .42, .47, .48, .485, .49, .495, .50, .52, .53, .60,
.62, .65, .70, .75, .85, 1.00. $S_O$ has 2776 vertices.

$\underline{P}_I$, $\bar{P}_I$: Inside bounds; same points as above. $S_I$ has 368 vertices.
there must be some couples with probabilities either below 0.48 or
above 0.50). There may be as many as 3.72% with probabilities below
0.30, as many as 14.79% below 0.40, and as many as 93.90% below 0.48.
At least 0.34% and possibly as much as 99.66% of the population have
probabilities below 0.49. At least 7.35% have probabilities below 0.50,
at least 89.78% below 0.60, and at least 96.96% below 0.70. The upper
outside bound $\bar{P}_O$ and the lower inside bound $\underline{P}_I$ perform well as approximations
to the sharp bounds. $\underline{P}_O$ does well except at 0.50, where it is quite
pessimistic (although, of course, still a true bound). $\bar{P}_I$ is always much
too small. Recalculation of the inside bounds with a finer partition
would remedy this problem.

Table 3 presents bounds for various intervals that do not begin
at zero. No Markov-Krein sharp bounds are available for these intervals,

Table 3
Bounds on the Fraction of the Population with $\theta$ between $\theta_L$ and $\theta_H$

$\theta_L$    $\theta_H$    $\underline{P}_O$    $\underline{P}_I$    $\bar{P}_I$     $\bar{P}_O$
0.30    0.70    .9625    .9697    1.0000    1.0000
0.40    0.60    .8500    .8851    1.0000    1.0000
0.40    0.50    0        0        .9955     1.0000
0.40    0.48    0        0        .7545     .9543
0.48    0.50    0        0        .9944     .9954
0.48    0.49    0        0        .9853     .9953
0.49    0.50    0        0        .9943     .9944
0.50    0.70    0        0        .9258     .9470

For explanation, see Table 2.
so the only way to judge the sharpness of the outside bounds is through
the inside bounds. At least 96.25% of all couples have probabilities
between 0.30 and 0.70, and we know that there exists a distribution
consistent with $\phi$ in which only 96.97% of the population lies between
0.30 and 0.70. On the other hand, it is possible that 99.54% of the
population has $\theta$ between 0.48 and 0.50, and we know for sure that
99 44/100% can be in this interval.

A fairly wide variety of distributions of the propensity to have
girls is consistent with the observed data on the distribution of the
number of girls among the first two children. Although little can be
done to localize the distribution in the vicinity of 0.5, our methods
give fairly specific information about the fraction of the population
with extreme propensities. The data are not consistent with any
distributions with large fractions of the population having extreme
values of $\theta$. An increase in the number of times each unit is observed,
in this case the number of children, would refine our knowledge considerably.
A study of sex composition that examined more than the first two children
would need to deal explicitly with the problem of stopping rules, however.

5. Mixed Markov Processes and the Mover-Stayer Model

This section illustrates the application of the methods discussed
earlier to a problem of considerable interest in the study of social
mobility. Suppose there are two states that an individual may occupy
in each period: poor or not poor, employed or not employed, lower class
or middle class, or some other dichotomy. Suppose further that a Markov
process governs transitions between the states: there is a probability $\theta$
that an individual in the first state in one period will move to the
second state in the next period, and a probability $\delta$ that an individual
in the second will move to the first. The probabilities of remaining
in the states are then $1 - \theta$ and $1 - \delta$ respectively. Models of this
kind fitted to data on observed transitions of individuals under the
assumption that $\theta$ and $\delta$ are the same for all of them have suffered from
an important defect: they understate the probability that an individual
will remain for many successive periods in the same state, even though
they predict correctly the probability that an individual chosen at
random from the inhabitants of one state will move to the other state
in the next period (Blumen, Kogan, and McCarthy (1955)).

The mover-stayer model resolves this paradox by assuming that
there are actually two kinds of people: movers, who have positive $\theta$, and
stayers, whose $\theta$s are zero. The probabilities of observed transitions
are the mixture of two different Markov processes. Methods for
estimating the parameters of the two processes and the single mixing
probability have been developed by Goodman (1960). Recently Spilerman
(1972) has proposed an extension of the model in which the observed
probabilities are treated as the mixture of all of the powers of a
particular transition matrix. None of the literature on the mover-stayer
model takes advantage of the statistical theory of mixtures, however.

A natural generalization of the mover-stayer model is the mixture
of all Markov processes. To keep within the confines of the theory
developed in this paper, however, we will suppose that individuals differ
only with respect to their probability of upward mobility, $\theta$, and that
$\delta$ is known and constant within the population. Then it is appropriate
to study the distribution of the number of spells in the second state
over a certain number of periods, $T$. Individuals with high values of $\theta$
will tend to have more spells than do those with low $\theta$. We define the
observed probability, $\phi$, in the following way:

$\phi_1$ = fraction of the population with no spells

$\phi_i$ = fraction with $i - 1$ spells

$\phi_M$ = fraction with $M - 1$ or more spells

Data on spells of unemployment during a year are reported by the U. S.
Census Bureau in precisely this form, with $M = 4$.

We define $a_i(\theta)$ as the probability of $i - 1$ spells in $T$ periods
induced by a Markov process with parameters $\theta$ and $\delta$. There is no simple
closed form for $a_i(\theta)$, but it can be calculated from the following
recursion: Let $Q(t,i,j)$ be the probability of having $i - 1$ spells in $t$
periods and of finishing in state $j$ at time $t$. Then

$$Q(t+1,i,1) = (1 - \theta)\,Q(t,i,1) + \delta\,Q(t,i,2)$$

$$Q(t+1,i,2) = \theta\,Q(t,i-1,1) + (1 - \delta)\,Q(t,i,2) \tag{5.1}$$

with

$$Q(0,i,j) = 0 \quad \text{if } i \ne 1$$

$$Q(0,1,1) = p^*$$

$$Q(0,1,2) = 1 - p^*$$

$$Q(t,0,1) = 0, \quad t = 1, \ldots, T \tag{5.2}$$

Here $p^*$ is the probability of being in the first state at time 0 and
might reasonably be taken as the steady-state probability of being in
the first state:

$$p^* = \frac{\delta}{\theta + \delta} \tag{5.3}$$

Finally,

$$a_i(\theta) = Q(T,i,1) + Q(T,i,2), \quad i = 1, \ldots, M - 1 \tag{5.4}$$

$$a_M(\theta) = 1 - a_1(\theta) - \cdots - a_{M-1}(\theta) \tag{5.5}$$

This puts the mixture into our standard form,

$$\phi = \int_0^1 a(\theta)\,dF(\theta). \tag{5.6}$$
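The recursion (5.1) through (5.5) is straightforward to program. The following is a minimal sketch, with a hypothetical function name and argument list. It tracks only the first $M - 1$ spell counts and assigns the remaining probability to $a_M$. When `p_star` is omitted it defaults to the steady-state value $\delta/(\theta + \delta)$ for a two-state chain. Following the initial conditions (5.2), an individual who starts in the second state is treated as having no spells until a new entry into that state occurs.

```python
def spell_probabilities(theta, delta, T, M, p_star=None):
    """Return [a_1(theta), ..., a_M(theta)] from recursion (5.1)-(5.5).

    a_i is the probability of i - 1 spells in the second state over T
    periods; a_M collects M - 1 or more spells.
    """
    if p_star is None:
        # Steady-state probability of the first state; needs theta + delta > 0.
        p_star = delta / (theta + delta)
    # q1[k], q2[k]: probability of k spells so far, ending in state 1 or 2.
    # Only k = 0, ..., M - 2 is tracked; the remainder goes to a_M.
    q1 = [0.0] * (M - 1)
    q2 = [0.0] * (M - 1)
    q1[0], q2[0] = p_star, 1.0 - p_star          # initial conditions (5.2)
    for _ in range(T):
        new1 = [(1 - theta) * q1[k] + delta * q2[k] for k in range(M - 1)]
        new2 = [(theta * q1[k - 1] if k > 0 else 0.0) + (1 - delta) * q2[k]
                for k in range(M - 1)]
        q1, q2 = new1, new2
    a = [q1[k] + q2[k] for k in range(M - 1)]    # (5.4)
    a.append(1.0 - sum(a))                       # (5.5)
    return a

# A stayer (theta = 0) never begins a spell; a mover usually does.
print(spell_probabilities(0.0, 0.3, T=12, M=4))
print(spell_probabilities(0.1, 0.4, T=12, M=4))
```

Evaluating this function on a grid of $\theta$ values gives the columns of $a(\theta)$ needed for the bounds of Section 3.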

All of our earlier techniques can be applied to obtain information about
the distribution of the probability of upward mobility among the
population. The mover-stayer model is the special case where $F(\theta)$
concentrates all its probability at $\theta = 0$ and at one other value of $\theta$.
From Theorem 2.1, if our data distinguish only among no spells, one
spell, and two or more ($M = 3$), then there is always a simple mover-stayer
model that explains the observed $\phi$, namely the canonical representation
involving $\theta^* = 0$. Other distributions will also be consistent with $\phi$,
however, and if the data on the number of spells are richer, the simple
mover-stayer model will not generally be able to explain $\phi$. In any case,
the assumption that there are exactly two types of people is a highly
restrictive one; our methods provide a workable method for relaxing it.
6. Extensions

Many investigators are likely to be willing to make restrictive
assumptions about the form of the distribution of the unobserved
component in order to tighten the results by ruling out implausible
distributions. This can be done through the conventional device of
confining the distribution to a family indexed by a limited number of
parameters. If the number of parameters is equal to the number of
observed probabilities, then it is often straightforward to calculate
$F(\theta)$ from $\phi$. For example, if $a(\theta)$ is binomial and $F(\theta)$ is a beta
distribution, then the parameters can be calculated directly from $\phi$;
see Maritz (1970), pp. 22-23. On the other hand, a weak parametrization
that imposes nothing more than smoothness on $F(\theta)$ will usually have
more than $M$ parameters, so more than one member of the parametric family
of distributions will be consistent with the observed $\phi$. The problem
then is essentially similar to the problem treated in this paper. In
particular, if the family is linear in its parameters, the set of parameters
consistent with $\phi$ is mathematically the same as the set $S_I$ derived in
Section 3. The family of distributions whose densities are step functions
is an important example of such a family.
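For the binomial kernel of order 2 with a beta mixing distribution, the calculation reduces to matching two moments. The sketch below is a method-of-moments illustration under that assumption. It is not taken from Maritz, and the function name is hypothetical. Applied to the sex-composition data of Section 4, it gives parameters of approximately 87.5 and 91.3, a beta distribution tightly concentrated near 0.49.

```python
def beta_from_phi(phi):
    """Beta(alpha, beta) parameters implied by a mixed binomial of order 2.

    Uses E[theta] = alpha / (alpha + beta) and
    E[theta^2] = alpha (alpha + 1) / ((alpha + beta)(alpha + beta + 1)).
    """
    m1 = phi[1] / 2 + phi[2]       # E[theta]
    m2 = phi[2]                    # E[theta^2]
    excess = m2 - m1 ** 2          # variance of theta; must be positive
    if excess <= 0:
        raise ValueError("phi is not consistent with a nondegenerate beta mixture")
    s = (m1 - m2) / excess         # s = alpha + beta
    return m1 * s, (1 - m1) * s

alpha, beta = beta_from_phi([0.262, 0.497, 0.241])
print(round(alpha, 1), round(beta, 1))
```

The beta family picks out a single member of the set of distributions consistent with $\phi$, which is the sense in which a parametric restriction tightens the results.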

Second, in practice we do not observe the probabilities $\phi$ but
only the corresponding frequencies, say $\hat{\phi}$. If we apply our methods to
$\hat{\phi}$, then our bounds become random variables that estimate the bounds but
are not truly bounds themselves. A confidence region enclosing $\hat{\phi}$ induces
a confidence interval for each bound. The only serious problem in
dealing with $\hat{\phi}$ arises when it does not lie in $\Phi$. For example, in a small
population it is possible that every family has one girl and one boy,
but there is no mixture of binomial distributions that gives rise to
the corresponding set of probabilities. Fortunately, if $\phi$ is in the
interior of $\Phi$, the probability that $\hat{\phi}$ lies outside $\Phi$ approaches zero as
the sample size increases.
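For the mixed binomial of order 2, membership of a frequency vector in $\Phi$ can be checked directly. A mixing distribution on [0, 1] exists exactly when the implied moments satisfy $E[\theta^2] \ge (E[\theta])^2$, which works out to $\hat{\phi}_2^2 \le 4\hat{\phi}_1\hat{\phi}_3$. This criterion is derived here for illustration and is not stated in the text; the function name is hypothetical, and the input is assumed to be a valid probability vector.

```python
def in_mixture_set(phi_hat, tol=1e-12):
    """True if phi_hat is a mixture of binomial(2, theta) distributions."""
    p1, p2, p3 = phi_hat
    return p2 * p2 <= 4.0 * p1 * p3 + tol

every_family_mixed = in_mixture_set([0.0, 1.0, 0.0])    # one girl, one boy in every family
observed_data = in_mixture_set([0.262, 0.497, 0.241])   # data of Section 4
print(every_family_mixed, observed_data)
```

The first vector fails the check, as in the small-population example, while the observed data pass it.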

Third, in many applications the probabilities of alternative
outcomes depend on the observed characteristics of the individual as
well as on his unobserved type. The easy way to incorporate this
dependence in our model is to let $F(\theta;x)$ be the distribution of $\theta$ within
the subpopulation of individuals with characteristics $x$. Then the
observed mixture also depends on $x$:

$$\phi(x) = \int_0^1 a(\theta)\,dF(\theta;x) \tag{6.1}$$

Given $\phi(x)$ for a particular $x$, we can then apply our methods to derive
information about $F(\theta;x)$. In practice, we specify $\phi(x)$ as a multinomial
probability depending on $x$ in a reasonably flexible way, using a
multinomial logit or other convenient specification. Note that $\phi(x)$
does not have the same structure as $a(\theta)$; for example, the study of mixed
Markov processes does not involve the estimation of the parameters of a
Markov process. From $\phi(x)$, we calculate bounds on $F(\theta;x)$ for representative
values of $x$.
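One way to let $\phi(x)$ depend on $x$ is a multinomial logit. The sketch below evaluates such a specification at a single $x$. The coefficient values are hypothetical and serve only to make the example run; in practice they would be estimated from the data, and the function name is likewise an assumption.

```python
import math

def multinomial_logit(x, coefs):
    """phi(x): outcome probabilities from a multinomial logit.

    coefs[i] is the coefficient vector of outcome i; the first outcome is
    the base category and conventionally has all-zero coefficients.
    """
    scores = [sum(b * xk for b, xk in zip(beta, x)) for beta in coefs]
    top = max(scores)                        # subtract the max for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for M = 3 outcomes and x = (1, x_1).
coefs = [[0.0, 0.0], [0.6, 0.1], [-0.1, 0.2]]
phi_x = multinomial_logit([1.0, 0.5], coefs)
print([round(p, 4) for p in phi_x])
```

The resulting vector plays the role of $\phi$ in the earlier calculations, provided it lies in $\Phi$ for the kernel $a(\theta)$ at hand.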

7.   Concluding  Remarks 

Unobserved  differences  among  individuals  are  an  important  source 
of  diversity  in  their  observed  behavior.   For  the  case  in  which  the 
probability distribution among the alternatives is a known function of
the unobserved type, this paper has shown that exact but not complete
knowledge of the distribution can be obtained. The assumptions of
previous authors about these distributions can, in fact, be tested.